Scalable Density-Based Distributed Clustering
نویسندگان
چکیده
Clustering has become an increasingly important task in analysing huge amounts of data. Traditional applications require that all data has to be located at the site where it is scrutinized. Nowadays, large amounts of heterogeneous, complex data reside on different, independently working computers which are connected to each other via local or wide area networks. In this paper, we propose a scalable density-based distributed clustering algorithm which allows a user-defined trade-off between clustering quality and the number of transmitted objects from the different local sites to a global server site. Our approach consists of the following steps: First, we order all objects located at a local site according to a quality criterion reflecting their suitability to serve as local representatives. Then we send the best of these representatives to a server site where they are clustered with a slightly enhanced density-based clustering algorithm. This approach is very efficient, because the local determination of suitable representatives can be carried out quickly and independently from each other. Furthermore, based on the scalable number of the most suitable local representatives, the global clustering can be done very effectively and efficiently. In our experimental evaluation, we will show that our new scalable density-based distributed clustering approach results in high quality clusterings with scalable transmission cost.
منابع مشابه
Design and evaluation of two scalable protocols for location management of mobile nodes in location based routing protocols in mobile Ad Hoc Networks
Heretofore several position-based routing protocols have been developed for mobile ad hoc networks. Many of these protocols assume that a location service is available which provides location information on the nodes in the network.Our solutions decrease location update without loss of query success rate or throughput and even increase those.Simulation results show that our methods are effectiv...
متن کاملDesign and evaluation of two scalable protocols for location management of mobile nodes in location based routing protocols in mobile Ad Hoc Networks
Heretofore several position-based routing protocols have been developed for mobile ad hoc networks. Many of these protocols assume that a location service is available which provides location information on the nodes in the network.Our solutions decrease location update without loss of query success rate or throughput and even increase those.Simulation results show that our methods are effectiv...
متن کاملNG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data
We present NG-DBSCAN, an approximate density-based clustering algorithm that operates on arbitrary data and any symmetric distance measure. The distributed design of our algorithm makes it scalable to very large datasets; its approximate nature makes it fast, yet capable of producing high quality clustering results. We provide a detailed overview of the steps of NG-DBSCAN, together with their a...
متن کاملDynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کامل